First off, installing on windows is a bit tricker than other operating systems. I'm assuming you already have ArcGIS/arcpy installed. We have to manually seak out and install the other dependencies...

Grab the following packages and install them normally

If you're using ArcGIS 10+, you probably have a system-wide python installation at C:\\Python27\\ArcGIS10\\. If there are other versions of python in your registry, you'll want to point the python installers here. Otherwise, it should be picked up as the default.

Now we're ready to use the rasterstats module alongside arcpy.


In [4]:
from rasterstats import raster_stats
import arcpy
arcpy.env = 'E:\\workspace\\rasterstats_blog'
states = 'E:\\workspace\\rasterstats_blog\\boundaries_contus.shp'
precip = 'E:\\workspace\\rasterstats_blog\\NA_Annual_Precipitation_GRID\\NA_Annual_Precipitation\\data\\na_anprecip\\hdr.adf'

One technique would be to use the arcpy SearchCursor to iterate through the features. This allows us to catch errors that result from ESRI's slightly broken implementation of the __geo_interface__


In [9]:
def get_stats(shp):
    cursor = arcpy.SearchCursor(shp)

    stats = []
    for feature in cursor:
        geom = feature.Shape
    
        try:
            rain_stats = raster_stats(geom, precip, stats="*")[0]
        except TypeError:
            # arcpy's geo_interface is broken; reports type=polygon for some multipolygons
            # fall back to WKT
            rain_stats = raster_stats(geom.WKT, precip, stats="*")[0]
    
        #rain_stats['NAME'] = feature.NAME
        #rain_stats['__fid__'] = feature.FID
        stats.append(rain_stats)
    return stats

%timeit stats = get_stats(states)


1 loops, best of 3: 57.9 s per loop

In [6]:
print [x for x in stats if x['NAME'] == "Oregon"][0]


{'count': 250510, 'std': 631.539502512283, 'minority': 3193, 'min': 205.0, 'max': 3193.0, 'sum': 195203001.0, 'median': 461.0, 'majority': 263, 'range': 2988.0, 'NAME': u'Oregon', 'unique': 2865, '__fid__': 35, 'mean': 779.2223903237395}

In [7]:
# Try it the straight rasterstats way
%timeit stats = raster_stats(states, precip, stats="*")


1 loops, best of 3: 8.25 s per loop

In [8]:
print [x for x in stats if x['NAME'] == "Oregon"][0]  # same as before


{'count': 250510, 'std': 631.539502512283, 'minority': 3193, 'min': 205.0, 'max': 3193.0, 'sum': 195203001.0, 'median': 461.0, 'majority': 263, 'range': 2988.0, 'NAME': u'Oregon', 'unique': 2865, '__fid__': 35, 'mean': 779.2223903237395}

In [ ]: